Web Content Adaptation Process and System

Technical Field

The present invention relates to an apparatus and method for adapting web page
content for display on an intended display device, by splitting the content into a plurality of
smaller web pages.

Background to the Invention and Prior Art

Delivering web content to different devices is a process of understanding,
restructuring and tailoring the content in such a way that the content source can be
understood and delivered to different devices (such as desktop PCs, PDAs, and mobile
phones) in a manner which suits the device characteristics. Within the prior art (see for
example "Current Technologies for Device Independence", Mark H. Butler, HP Labs Technical
Report HPL-2001-83, 4 April 2001), there are three presently known ways of doing this.

Firstly, web developers/authors can use web page development software to tailor
the content manually to suit different devices, at the web content development stage. By
doing this, different versions (e.g. HTML/CSS, WML, XML/XSL) of a single source can be
created based on the device capabilities. This approach is the most primitive way to deliver
web content to different devices, and is a time-consuming and tedious task for web
developers/authors if a large number of versions are required.

A second, more automated approach is proxy-based transcoding, in which a proxy
server does the adaptation work on the fly when an end user submits a URL link through
an HTTP request. This approach is computationally intensive at the proxy server, with the
result that the system response time is slowed. Furthermore, the original web
developers/authors have no opportunity to intervene in the adapted web content, which
may raise legal and copyright issues in some countries.

A third known technique is to use a client-based (end user device) adaptation
approach by installing the adaptation system software at the client side. The client-based
adaptation system will adapt the web content on the fly after it receives the result sent back
by the requested web server. This approach is computationally intensive at the client side,
which consumes resources and degrades the client processing performance. Again, the
original web developers/authors have no opportunity to intervene in the adapted web
content, which might raise legal and copyright issues as well. Furthermore, this approach
cannot be applied in small mobile devices due to computation power limitations.
As part of the adaptation, it may be necessary to split a page of web content into a
plurality of smaller pages (known as content splitting), in view of the small size of client
display devices such as PDAs and WAP phones. For example, to render the contents of a
PC web page (e.g. 800x600 pixels) to a smaller display (for example a PDA with 240x320
pixels) in a readable form, it is desirable to adapt the content to fit into the fewest number of
smaller pages whilst also trying to minimize the amount of white space.

Prior known methods for content splitting include those described in US Patent
Application No. 09/942,051 (published as US 2003/0050931 A1), entitled "System,
Method and Computer Program Product for Page Rendering Utilizing Transcoding".

This document describes a system which is able to adapt web content for display on
different viewing devices by splitting the content into multiple pages. The system firstly uses
the web content to construct a hierarchical tree structure using XML, which is then
formatted for display on the device (e.g. by changing to a text font which is supported by the
viewing device, and replacing redundant information with references to variables). The
formatted structure is then split into multiple pages for output to the viewing device.
However, whilst such an approach achieves the objective of splitting the content to fit the
client display device, it does not satisfy the desirable objective of minimizing the amount of
white space.

In view of the above, there is a need for a further approach which adapts web page
content for display on an intended display device without the disadvantages of the prior
art, and which in particular attempts to minimize the amount of white space displayed
with the content.

Summary of the Invention 

In order to meet the above, the present invention provides an apparatus and
method for adapting web page content for display on an intended device. Here, an
integrated process of splitting combined with transformations is applied in an iterative
manner to adapt the content display size, thus allowing the content to be split into a
suitable number of smaller web pages whilst keeping to a minimum the amount of white
space that will be shown on the pages. In this context it is understood that the term
"transformation" can include, for example, reducing / increasing the size of images / text,
removal / replacement of content, etc.

According to a first aspect of the present invention, there is provided an apparatus
for adapting web page content for display on an intended display device, comprising
adaptation means for splitting the content into a plurality of smaller web pages for display on
said device, the adaptation means being arranged in use to:

(i) split the content into a plurality of content portions, and to iteratively repeat steps
(ii) to (vi) for at least one of the content portions;

(ii) analyse the content to determine whether the size of the content portion is
suitable for display on said device;

(iii) if the size of the content portion is not suitable for display on said device, then
apply a plurality of transformations to the content portion;

(iv) analyse the transformed content to determine whether the size of the
transformed content portion is suitable for display on said device; and

(vi) if the size of the transformed content portion is not suitable for display on said
device then split the content portion into a plurality of further content portions.

According to a second aspect of the present invention, there is provided a method
for adapting web page content for display on an intended display device, comprising
splitting the content into a plurality of smaller web pages for display on said device by:

(i) splitting the content into a plurality of content portions, and iteratively repeating
steps (ii) to (vi) for at least one of the content portions;

(ii) analysing the content to determine whether the size of the content portion is
suitable for display on said device;

(iii) if the size of the content portion is not suitable for display on said device, then
applying at least one content transformation to the content portion;

(iv) analysing the transformed content to determine whether the size of the
transformed content portion is suitable for display on said device; and

(vi) if the size of the transformed content portion is not suitable for display on said
device then splitting the content portion into a plurality of further content portions.
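
By way of illustration only, the iterative procedure of steps (i) to (vi) might be sketched in Java as follows. The `Portion` type, the three-quarters shrink used as the transformation, and the halving split are assumptions made for this sketch and are not taken from the embodiment. The sketch also assumes that, at step (vi), the untransformed portion is the one that is split.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a content portion is modelled by nothing but its pixel height. */
final class SplitSketch {
    static final class Portion {
        final String name;
        final int height;
        Portion(String name, int height) { this.name = name; this.height = height; }
    }

    /** Assumed transformation: scale the portion to three quarters of its height. */
    static Portion transform(Portion p) {
        return new Portion(p.name, p.height * 3 / 4);
    }

    /** Assumed split: two parts whose heights sum to the original height. */
    static List<Portion> split(Portion p) {
        int half = p.height / 2;
        return List.of(new Portion(p.name + ".a", half),
                       new Portion(p.name + ".b", p.height - half));
    }

    /** Steps (ii) to (vi) for one portion; finished pages are appended to pages. */
    private static void adapt(Portion p, int maxHeight, List<Portion> pages) {
        if (p.height <= maxHeight) {           // (ii) already suitable
            pages.add(p);
            return;
        }
        Portion t = transform(p);              // (iii) apply a transformation
        if (t.height <= maxHeight) {           // (iv) suitable after transformation
            pages.add(t);
            return;
        }
        for (Portion part : split(p)) {        // (vi) split further and repeat
            adapt(part, maxHeight, pages);
        }
    }

    /** Step (i): the caller supplies the initial portions. */
    static List<Portion> adaptAll(List<Portion> portions, int maxHeight) {
        if (maxHeight < 1) {
            throw new IllegalArgumentException("maxHeight must be positive");
        }
        List<Portion> pages = new ArrayList<>();
        for (Portion p : portions) {
            adapt(p, maxHeight, pages);
        }
        return pages;
    }
}
```

Trying the transformation before splitting is what keeps the page count, and hence the white space, low: a portion that is only slightly too tall becomes one page rather than two half-empty ones.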

According to a third aspect of the present invention, there is provided a computer
program or suite of programs so arranged such that when executed by a computer system
it/they cause/s the system to perform the method above. The computer program or
programs may be embodied by a modulated carrier signal incorporating data corresponding
to the computer program or at least one of the suite of programs, for example a signal being
carried over a network such as the Internet.
Additionally, from a yet further aspect the invention also provides a computer
readable storage medium storing a computer program or at least one of a suite of computer
programs according to the third aspect. The computer readable storage medium may be
any magnetic, optical, magneto-optical, solid-state, or other storage medium capable of
being read by a computer.

Brief Description of the Drawings 

Further features and advantages of the present invention will become apparent
from the following description of an embodiment thereof, presented by way of example only,
and by reference to the accompanying drawings, wherein like reference numerals refer to
like parts, and wherein:

Figure 1 is a system block diagram illustrating the components of the embodiment
of the invention, and the signal flows therebetween;

Figure 2 is a process flow diagram illustrating in more detail how information flows
between the components of the embodiment of the invention in operation;

Figure 3 is a flow diagram for an algorithm to detect characteristics of display
objects in web content within the embodiment of the invention;

Figure 4 is a decision tree to detect the functions of display objects in web content
within the embodiment of the invention;

Figure 5 is a flow diagram illustrating how content transformations can be applied
in the embodiment of the invention; and

Figure 6 is a flow diagram illustrating how the process of content splitting is
performed in the embodiment of the invention.

Description of the Embodiment

An embodiment of the present invention will now be described with reference to 
Figures 1 to 6. 

Figure 1 is a system block diagram of the system provided by the embodiment of
the invention. This system consists of 8 sub-components, as described next. The full
operation of the system will be described later.

Firstly there is provided the client capability discovery module 12. The purpose of
this module is to discover the end user's device characteristics, e.g. the type of device and
its capabilities such as screen size/resolution supported, processing power etc., and as such
this module receives information from the end user display device relating to its capabilities.
The client capability discovery module 12 passes the end user's device information to the
Decision Module 14.

The Decision module 14 contains existing Client Capabilities profile IDs which were
previously detected or predefined by the adaptation system. Client Capabilities profile IDs
identify sets of information relating to display device display characteristics. In use the
Decision module 14 first compares an end user's device characteristics and capabilities (CC),
based on the information sent by the Discovery Module, with the CC ranges of the existing
capability profiles. If the client capabilities match an existing profile, then the profile ID of
the matching profile is sent to a content Cache 10, in which are stored different versions of
pre-generated adapted web content. If the received CC set does not match an existing CC
range (i.e. there is no existing CC profile which matches the present requesting device
capabilities) then the Adaptation module 16 will be triggered. Additionally, the adaptation
module 16 may also be triggered manually to generate different versions of the original web
content, without there being a specific request from an end-user.
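
The matching performed by the Decision module 14 might, purely as an illustrative sketch, look like the following. The `Profile` class, the string encoding of resolutions, and the `ADAPTATION_TRIGGERED` return value are hypothetical stand-ins. The behaviour when a profile matches but no cached version exists is not specified above and is assumed here to trigger adaptation as well.

```java
import java.util.List;
import java.util.Map;

/** Sketch only: profile matching followed by a cache lookup. */
final class DecisionSketch {
    /** Hypothetical profile: an ID plus the accepted values of two fields. */
    static final class Profile {
        final String id;
        final String clientType;
        final List<String> resolutions;
        Profile(String id, String clientType, List<String> resolutions) {
            this.id = id;
            this.clientType = clientType;
            this.resolutions = resolutions;
        }
        boolean matches(String type, String resolution) {
            return clientType.equals(type) && resolutions.contains(resolution);
        }
    }

    /** Returns the ID of the first matching profile, or null when none matches. */
    static String findProfileId(List<Profile> profiles, String type, String resolution) {
        for (Profile p : profiles) {
            if (p.matches(type, resolution)) {
                return p.id;
            }
        }
        return null;
    }

    /** Serves pre-generated content on a match; otherwise signals that adaptation is needed. */
    static String serve(List<Profile> profiles, Map<String, String> cache,
                        String type, String resolution) {
        String id = findProfileId(profiles, type, resolution);
        if (id != null && cache.containsKey(id)) {
            return cache.get(id);
        }
        return "ADAPTATION_TRIGGERED"; // stand-in for invoking the adaptation module
    }
}
```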
Howsoever the adaptation module 16 is triggered, the module 16 then acts to
examine the HTTP header of the requested web content, and further acts to control a content
analysis module 20 to retrieve the requested web content from a web content source store
22. The content analysis module 20 then acts to analyse the indicated web content from
the content source store 22, and passes back to the adaptation module 16 a range of
parameters relating to the characteristics of the web content as input values to the
adaptation module 16. The input parameters received at the adaptation module 16 from the
content analysis module 20 enable the adaptation module 16 to adapt the requested web
content from the original web content stored in the web content source store 22. The output
of the adaptation module 16 is therefore an adapted version of the originally requested web
content, and this is sent in an appropriate mark-up language such as HTML to a content
cache 10 together with a set of client capability (CC) information, being a set of one or more
characteristics of the display of the device requesting the web content. Such display
characteristic information was determined by the client capability discovery module 12, as
previously described.

In addition to the modules mentioned above, there is also provided, as previously
mentioned, a content cache 10 which acts to store different adapted versions of the original
web content. In some embodiments, the cache 10 may also store client capability
characteristics, being the set of information relating to the display characteristics of different
client display devices. Also provided is the content source 22, in which the original web
content to be adapted is stored, and a content tidy module 18 which acts under the control
of the content analysis module 20 to tidy up the original web content received from the
content source store 22 prior to the analysis thereof. Furthermore, a customisation module
24 is also provided, which is a front end system providing previews of the adapted
web content from the content cache. The customisation module 24 is an offline module
which allows the author to preview and further customise adapted content.

Having described the various system modules provided by the embodiment of the 
present invention, the operation of those modules will now be described in further detail with 
respect to Figures 2 to 6. 

The apparatus provided by the present invention would most likely be embodied in
a computer system acting as a web server or the like. Alternatively, the apparatus can be
embodied in a single computer system, or the components (as shown in Figure 1) can be
embodied in separate computer systems, with one computer system acting as a web server to
the other computer systems. Howsoever the apparatus is embodied, the computer system
acts not merely as a server, but also allows the web content author to develop the web
content and review it thereon. Moreover, the system also acts to generate different
versions of the original web content for different intended display devices. In view of these
functions, there are three distinct modes of operation of the system: a first mode wherein it
is acting to service user requests for web content, the request having been received over a
network; a second mode wherein it acts to generate adapted versions of the web content for
different intended display devices in advance of the receipt of requests from users for such
web content; and a third mode wherein adaptation of web content to provide a further
adapted version can be performed on the fly in response to a user request. Each of these
modes of operation will now be described.

Dealing with the second of the above described modes of operation first, imagine
that the system is being used to generate different versions of original web content in
advance of user requests therefor. For example, this mode could be used during website
development to provide versions of an original source web content each especially adapted
for different intended display devices, each with different display characteristics.

Therefore, when in this mode the first step to be performed is that a plurality of sets
of predefined intended display device display characteristic profiles are created, each set
having a unique ID and corresponding to a set of one or more display characteristics, each
characteristic taking a range of values. For example, a first client capability profile set could
have fields entitled client_type, screen_resolution, and colour_depth. A first profile ID CC1
would by way of example have the value "PC" in the client_type field, the value "800 x 600"
or "1024 x 768" in the screen_resolution field, and "16bits" in the colour_depth field. As a
further example, another profile would have [...] in the client_type field and the value
"200 x 300" in the screen_resolution field [...]. These are merely non-limiting
examples of the specifications which can be made, and a
large number of different profiles can easily be created by forming different
combinations of device characteristics and [...]. The created
profile sets are stored in a profile store 26, which is merely a
database system which acts to physically store the device profile sets, and which is
accessible by the decision module 14 in order to compare
requesting user display device characteristics with the stored client capability profile sets.

Having created a plurality of client capability profile sets, each corresponding to a
combination of intended display device characteristics, in the mode of operation
presently described the system acts to then generate an adapted version of the original
source web content for each of the client capability profile sets held in the profile store
26. This is performed by triggering the adaptation module 16 to adapt the original source
web content to match each client capability profile set. The adaptation module 16 will
usually be triggered separately for each client capability profile set, such that on each
triggering, a single adapted version of the web content corresponding to a single client
capability profile set will be generated. The detailed operation of the adaptation module 16
[...]

[...] by the adaptation module 16 [...] the content
analysis module 20, by passing [...] the web content source to be
adapted. The [...] content source from
the content source store 22 (which stores the original web content sources created by
developers/authors), [...] to the Content Tidying
module 18 for conversion to an xHTML file. The purpose of the Content Tidying module 18
is to tidy up the structure of the mark-up language [...].
The xHTML format provides a neat [...] for the content analysis
module 20 to perform the analysis [...]. The Content Tidying module 18 may be implemented
using 3rd party software such as [...], available at
http://tidy.sourceforge.net. As such, no further details of the operation of the content tidying
module 18 will be given here. The Content Tidying [...]

[...] content, the Content Analysis module 20 then
performs the following tasks in sequential order:
i) calculates the total and individual pixels and characters of display objects in the
web content;

ii) detects the functions of individual objects in the web content (normally they are
the tags in a web page). For example, an object may possess a styling, structural or display
tag;

iii) groups single objects based on their structural behaviour (information from
object tag); and then

iv) matches the display object display patterns, and groups them together to form a
group (performed using a Pattern Matching algorithm).

These four tasks are performed by respective dedicated algorithms, the details of
which are given next.

Concerning the first task, the purpose of this is to calculate the pixels/size of
display objects such as text, images etc. The algorithm which performs this task will first
detect the type of display object. The algorithm then applies different analysis logic for
different types of display object. For example, if the display object is a text object, then it
gets the length, font style and size and calculates the pixels based on these inputs. If the
display object is an image/applet/object, then the algorithm will calculate total pixels based
on the width and height of the object. For the rest of the display objects, the algorithm will
calculate total pixels based on the width, height and/or width/height attributes set in the
parameter of the object (if specified in the HTML content). The exact steps performed
by this algorithm are shown in Figure 3, and described next.

Referring to Figure 3, the first step to be performed by the algorithm at step 3.2 is
that it detects an individual display object within the tidied web content. Then, at step 3.4 an
evaluation is made to determine whether or not the detected display object is text. If this
evaluation returns positive, such that the detected display object is determined to be text,
then at step 3.6 the length of every text string within the display object is obtained. Next, at
step 3.8 a text tag is created for every string, and at step 3.10 the number of characters in
the string determined at step 3.6 is set as an attribute of the text tag.

Next, at step 3.12 the font and style of every text string is determined, and then at
step 3.14 the size of every text string is also determined. Using this information, at step
3.16 the height and width of each text string is calculated based on its font, style, and size
attributes, and these calculated height and width values are set as further attributes of the
text tag for each string at step 3.18. The process for that particular display object which
was determined to be text then ends at step 3.50, and the process starts once again at step
3.2 to detect the next display object in the web content. Once all the display objects [...]

Returning to step 3.4, if it is determined that the detected display object is
not text, then a second evaluation is performed at step 3.20 to determine whether the
detected display object is an image, applet, or object. If this evaluation returns positive, i.e.
that the display object is an image, applet, or object, then processing proceeds to step 3.22
wherein a further evaluation is performed to determine whether the width of the detected
image, applet, or object is specified. If this is the case then processing proceeds to step
3.24. If this is not the case, then processing proceeds to a step wherein the original
width of the object is determined, and thereafter processing proceeds to step 3.24.

At step 3.24 a further evaluation is performed to determine whether the
height of the detected image, applet or object is specified. If this evaluation returns positive
then processing proceeds to step 3.26. On the contrary, if the evaluation
returns a negative, then processing proceeds to a step wherein the original height of the
object is determined, [...].

At step 3.26, the width and height of the image, applet, or object as
determined by the previously described steps are set as [...] attributes of
the object tag within the web content. The process for that particular
display object then ends at step 3.50. As before, [...] processed
[...].

Returning to step 3.20, if the evaluation performed therein determines that the
[...] processing proceeds to step
[...] be repeated as required as described previously.

If at step 3.32 it is instead determined that [...] is not
specified, then processing proceeds to a further evaluation [...] wherein it is
determined whether or not the size of the detected [...]. If this is the
case, then processing proceeds to step 3.34 [...] the
style attribute of each control type for the object. [...] proceeds to step
3.50 where it ends, but may be repeated [...]. [...] a final evaluation at
step [...] is
specified and if so then the specified value is set as parameters in the style attribute of each
control type for the object at step 3.34. On the contrary, if no such value is specified then
processing proceeds to step 3.40 wherein a default width and height of each control is
retrieved from memory, which are then set as default values at step 3.34.

This algorithm therefore acts to determine for each display object within the tidied
web content size parameters such as the length of a text string, or the width and height of
an image. This information may then be used in the adaptation process, to be described
later.
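
The per-type size determination described above can be sketched as follows. This is a minimal illustration only: the method names are hypothetical, the default control size is invented, and the text estimate (half the font size per character, line height of 1.2 times the font size) is a rough assumption rather than the font-metric calculation of Figure 3.

```java
/** Sketch only: size of a display object as a {width, height} pair in pixels. */
final class SizeSketch {
    // Assumed defaults for controls whose size is not specified anywhere.
    static final int DEFAULT_WIDTH = 100;
    static final int DEFAULT_HEIGHT = 20;

    /** Text: estimated from the character count and font size (integer arithmetic). */
    static int[] textSize(String text, int fontSizePx) {
        int width = (text.length() * fontSizePx + 1) / 2; // about half the font size per character
        int height = fontSizePx * 6 / 5;                  // line height of 1.2 x font size
        return new int[] {width, height};
    }

    /** Image, applet or object: specified attribute if present, else the original dimension. */
    static int[] imageSize(Integer specWidth, Integer specHeight, int origWidth, int origHeight) {
        int width = specWidth != null ? specWidth : origWidth;
        int height = specHeight != null ? specHeight : origHeight;
        return new int[] {width, height};
    }

    /** Other objects such as controls: specified size if present, else the defaults. */
    static int[] controlSize(Integer specWidth, Integer specHeight) {
        int width = specWidth != null ? specWidth : DEFAULT_WIDTH;
        int height = specHeight != null ? specHeight : DEFAULT_HEIGHT;
        return new int[] {width, height};
    }
}
```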

Concerning the second task, a further algorithm is provided to perform this task, as
described next.

First the algorithm to perform the second task pre-defines the function categories of
single objects from a mark-up language perspective. A Single Object (O) is an element
embedded in a mark-up language which carries properties of its own such as display styles,
static or dynamic and structural styles.

We define the following pre-defined categories:

Information (I)
Information Title (T)
Control (C)
Decoration (D)
Replaceable Navigator (RN)
Un-replaceable Navigator (UN)
Replaceable Navigator Title (RNT)
Un-replaceable Navigator Title (UNT)

The definitions of these function categories are as follows:

Information (I) - an object that provides informative displayed content, which is
important and cannot be replaced. This object can be text, image, video, audio or any
object (such as JAVA applet) file.

Information Title (T) - an object that describes the information object, which can be
the text header or image with information properties.

Control (C) - an object that is meant for user interactive purposes, such as a button
(radio or submit), input text area, form, drop down menu, check box, list box etc.

Decoration (D) - an object that does not play an informative role but is solely for
improving the effect of visualization. This object can be image or text.
Replaceable Navigator (RN) - a Navigator is a URI link object. A Replaceable
Navigator is a Navigator object that can be replaced by alternative text. It must be an image
provided with alternative text.

Un-replaceable Navigator (UN) - as above, a Navigator is a URI link object.
An Un-replaceable Navigator is therefore a Navigator object that cannot be replaced by
alternative text. It might be text or image without alternative text.

Replaceable Navigator Title (RNT) - a Replaceable Navigator Title is a
URI link object which describes a Navigator object. It can be replaced by alternative text. It
must be an image provided with alternative text.

Un-replaceable Navigator Title (UNT) - [...]
describes a Navigator object. It cannot be replaced by alternative text. It might be text or
image without alternative text.

By providing such pre-defined function categories, the algorithm starts a scanning
and comparing mechanism that analyses the [...]
mark-up language (such as HTML). The reasoning of the analysis is [...] a decision tree
(scanning and comparison logic sequence), as shown in Figure 4.

The algorithm begins by scanning the [...].
When the scanning process starts, every single object [...] is
searched, detected and compared with the pre-defined function categories. [...]
process is carried out until the end of the mark-up language.

Within the scanning loop, the [...]
their function based on [...]. [...] stops comparing
the single object (On) properties and searches for the next single object (On+1) after the
first Single Object On has qualified for a particular function category.

Referring to Figure 4, the decision tree process applied by [...] is as
follows. The algorithm starts by first searching [...]. Once an object
has been detected, the algorithm then checks at step 4.2 as to whether the detected object
is a hyperlink object. [...] the algorithm categorises the
object as a Replaceable Navigator (RN) 46.

If there is no alternative text but there are title properties for this object, then the
algorithm categorises this object as an Un-replaceable Navigator Title (UNT) 52. Else if
there is no alternative text and no title properties for this object, then the algorithm
categorises this object as an Un-replaceable Navigator (UN) 50. This distinction is
evaluated at step 4.16.
The title properties of RNT and UNT are based on the following conditions:
It has title header properties; and
It is a display object; and
It must be a URI hyperlink (image or text); and
It has different styles compared to its adjacent display object.

After comparing the object with the hyperlink properties at steps 4.2, 4.4, 4.6, and
4.16, if the object has not yet been categorised the invention will route the checking logic to
non-hyperlink properties. User side interaction properties are the next to be compared.
A single object has user side interaction properties if it is one of the following: button
(radio or submit), input text area, form, drop down menu, check box, or list box, and an
evaluation to this effect is made at step 4.8.

If the single object is detected at step 4.8 as having user side interaction
properties, it will be categorized as a Control (C) 42. Else, the algorithm will further
check whether it is an object which carries video, class object or audio properties, at step
4.10. If it is, then this single object will be included in the Information (I) function
category 44.

If the object has still not been categorised, the algorithm further checks the single
object by determining if there are decoration properties carried by the single object at step
4.12. The decoration properties are determined based on the following criteria:

The size of the single object - the threshold size is an experimentally derived
value which best represents the size of decoration objects; or

The presence of symbols, lines and separators between the present single object
(On) and the next single object (On+1).

The object size (width & height) threshold is based on an experimental (subjective)
value. The inventors performed experimental tests on 100 web pages, and the results
showed that images with pixel sizes width <= 20 and height <= 20 tended to be decoration
objects.

If the present single object satisfies the above conditions, it will be
categorized as a Decoration (D) function 54. If there are no decoration properties found
within the single object, the invention will further check for information title properties, at
step 4.14.



Once the single object is determined as not having decoration properties, [...]
A single object will only be qualified as Information Title (T) if:
It has title header properties; and
It is a display object; and
It might be text or image only; and
It has different styles compared to its adjacent display object.

If the single object is determined not to have the title properties, it will be
categorized as an Information (I) [...].
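
The order of checks in the decision tree of Figure 4 might be sketched as below. The `Obj` class is a hypothetical flattened view of a single object, with one boolean per property named in the text. The 20-pixel decoration threshold is the experimental value quoted above. Assigning RNT to a hyperlinked image that has both alternative text and title properties is inferred from the category definitions rather than from the figure.

```java
/** Sketch only: function categorisation of a single object. */
final class CategorySketch {
    enum Category { I, T, C, D, RN, UN, RNT, UNT }

    /** Hypothetical property summary of one single object. */
    static final class Obj {
        boolean hyperlink;   // URI link object
        boolean image;       // rendered as an image
        boolean altText;     // alternative text provided
        boolean titleProps;  // title header properties, style differs from neighbours
        boolean control;     // button, input text area, form, menu, check box, list box
        boolean media;       // carries video, class object or audio properties
        boolean separator;   // symbol, line or separator between neighbouring objects
        int width;
        int height;
    }

    static Category classify(Obj o) {
        if (o.hyperlink) {
            // Replaceable only when it is an image provided with alternative text.
            boolean replaceable = o.image && o.altText;
            if (replaceable) {
                return o.titleProps ? Category.RNT : Category.RN;
            }
            return o.titleProps ? Category.UNT : Category.UN;
        }
        if (o.control) {
            return Category.C;
        }
        if (o.media) {
            return Category.I;
        }
        boolean smallImage = o.image && o.width <= 20 && o.height <= 20;
        if (smallImage || o.separator) {
            return Category.D;
        }
        return o.titleProps ? Category.T : Category.I;
    }
}
```

Each object leaves the tree at the first category it qualifies for, which mirrors the statement that comparison of object On stops once it has qualified and scanning moves on to On+1.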

Therefore, as will be apparent from the above, [...] process and
comparing mechanism of this algorithm [...].

[...] clusters based on their positioning information. Structural tags [...]

The structural tags we recognise and select are: [...]
<ADDRESS>, <BLOCKQUOTE>, <Hn>, <HR>, <CENTER>, [...] MENU [...]

[...] clustering objects because they are able to group objects
together visually when the objects are displayed on client [...].

The operation of the algorithm which performs this task is simple, [...]

[...] adapting the content, the multimedia objects that make up the page need to be grouped into
potential chunks.

Yang and Zhang of Microsoft Research have described a system for locating such
content chunks, in Yang, Y. and Zhang, H.J., "HTML Page Analysis Based on Visual Cues",
in International Conference on Document Analysis and Recognition (ICDAR 2001), 2001.
The following paragraphs outline a similar system. Both systems use the HTML tags to
perform an initial grouping of multimedia objects into possible composite objects, followed
by application of pattern matching to find possible further groupings. The difference
between the systems lies in the distance measure used to determine the similarity of
various objects and the algorithm for pattern matching.

Initial grouping of objects 

Before performing the initial grouping of multimedia objects into possible composite
objects, the HTML document is parsed into an xHTML tree to clean up the HTML tags and
to form an easy to manipulate structure. The xHTML tree consists of HTML tags at the
nodes and multimedia objects at the leaves.

The next step involves the construction of a group tree, in which the leaves contain
multimedia objects and nodes denote composite objects (and so potential content chunks),
up to the top node which denotes the entire web page. The xHTML tree is transformed into
a group tree by first inserting <g> tags directly above a predefined set of HTML tags
associated with the natural breaks in the content, mainly block level tags, such as <table>,
<td>, <form>, <center> and <h>. Second, a set of tokens, one for each type of multimedia
object, is defined along with sets of attributes, for example the number of characters in the
text string, or the width and height of the image. Third, working from the multimedia objects
at the leaves in the tree, the tokens are passed upwards and all nodes other than the leaves
and those containing <g> tags are removed. As a token is passed upwards it accumulates
attributes associated with the nodes; if a node has more than one child then all the children
receive the attribute associated with it. Some formatting tags, such as <tr>, are ignored
since they do not impose any attributes onto the multimedia elements and, unlike the tags in
the predefined set, are not usually indicative of a new content chunk. If a <g> tag node has
more than one child then the tokens are arranged in a linear list in the same left-to-right
order in which the child nodes are arranged.

By labelling the objects associated with various block-level tags, such as tables
and cells, as potential groups, the group tree already incorporates the majority of the
composite objects and so content chunks. This technique assumes, not always correctly,
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formatting ob^ots does not d«inguish ^^J^"''"'^^^^^ ,ee has been 
5 each <g> node. 

■ ''^'^'"".p.tbepatte.n— .P.ce^,sa^^^^^^ 

..ens in eao. ot tbe nodes are Cfcnt. Upt) 

,0 assoc,eted«i.h..Eacha«ribute=ons,s.o,2ean P^^^^^^^^ 

" '''^"":"t«otoKens,o.ana.w,«,tnesetso,att.b.es:(r^r) Z 
and (r'.F/). J'l.-."' the following similarity measureis used 

^ere i);;,K;Xr',.')=> - . , . .-uch that r = - k; 

20 ii,(r,.,,.)(r',.')->n(r,^/)Mr.^/)"-^--'-*'*^'^' 

K» and Vf are integers 

Hi) (^,^7,"K^^^'0=0 otherwise. 

30 variation in their length and composition. 
public boolean Compare(ArrayList A, ArrayList B)
{
    float M[][] = new float[A.size()+1][B.size()+1];
    float Allow = 0.55f; // acceptable average gain per token
    float P = 0.3f;      // punishment for non-diagonal transitions
    for (int x=1; x<=A.size(); x++)
        for (int y=1; y<=B.size(); y++)
            if ((x-y)<=2 && (x-y)>=-2) // if within 2 of diagonal
            {
                // S is the token similarity measure defined above
                M[x][y] = Math.max(M[x-1][y-1], Math.max(M[x-1][y]-P, M[x][y-1]-P))
                          + S(A.get(x-1), B.get(y-1));
            }
    if (M[A.size()][B.size()] > Math.min(A.size(), B.size()) * Allow)
        return true;
    else return false;
}

Table 1 . JAVA code for the comparison of lists of tokens. 
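
The comparison of Table 1 can be exercised in isolation with a small self-contained sketch. The version below keeps the band of width 2 around the diagonal and the constants 0.55 and 0.3 from Table 1, but replaces the attribute-based similarity measure S with a hypothetical stand-in (1 for identical token labels, 0 otherwise); the class name TokenListCompare and the use of plain strings as tokens are assumptions made purely for illustration.

```java
import java.util.List;

class TokenListCompare {
    static final float ALLOW = 0.55f; // acceptable average gain per token (Table 1)
    static final float P = 0.3f;      // penalty for non-diagonal transitions (Table 1)

    // Hypothetical stand-in for the similarity measure S: exact label match only.
    static float s(String a, String b) {
        return a.equals(b) ? 1f : 0f;
    }

    // Banded alignment of two token lists; cells outside the band stay at 0.
    static boolean compare(List<String> a, List<String> b) {
        float[][] m = new float[a.size() + 1][b.size() + 1];
        for (int x = 1; x <= a.size(); x++) {
            for (int y = 1; y <= b.size(); y++) {
                if (Math.abs(x - y) <= 2) { // within 2 of the diagonal
                    float best = Math.max(m[x - 1][y - 1],
                            Math.max(m[x - 1][y] - P, m[x][y - 1] - P));
                    m[x][y] = best + s(a.get(x - 1), b.get(y - 1));
                }
            }
        }
        // Similar if the accumulated gain exceeds the allowed average per token.
        return m[a.size()][b.size()] > Math.min(a.size(), b.size()) * ALLOW;
    }
}
```

With this stand-in, two identical three-token lists accumulate a score of 3 against a threshold of 1.65 and are reported as similar, whereas lists with no tokens in common score 0 and are not.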

To detect patterns, a lower triangular matrix, minus the diagonal elements, is first
constructed detailing which of the child nodes (lists of tokens) are similar to one another.
Next the significant token pattern, that is the repeated sequence of similar nodes that
covers the largest number of child nodes, is found by examining all possible patterns. The
significant token pattern denotes the start of each new group.

To prevent trivial significant token patterns emerging, a number of constraints are applied,
namely:

The pattern must be at least two child nodes in length; and

The pattern must be repeated at least twice; and

Instances of the same pattern should not overlap.

As these significant tokens denote the start of the groups (or content chunks), the
groups are themselves extended by adding the following child nodes into the groups whilst
ensuring non-overlapping and reasonable similarity amongst the groups.
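
One possible reading of the pattern search described above is sketched below as a brute-force scan. It assumes the similarity matrix is supplied as a boolean lower triangle (entry [i][j], with j < i, is true when child nodes i and j are similar), interprets "repeated at least twice" as at least two non-overlapping occurrences, and counts occurrences greedily from left to right. The class and method names are hypothetical, and the subsequent extension of the groups is not shown.

```java
class PatternFinder {
    // Looks up similarity of child nodes a and b in the lower triangular matrix.
    static boolean similar(boolean[][] lower, int a, int b) {
        if (a == b) return true;
        return a > b ? lower[a][b] : lower[b][a];
    }

    // True if the len nodes starting at j are pairwise similar to those starting at start.
    static boolean matches(boolean[][] lower, int start, int j, int len) {
        for (int k = 0; k < len; k++) {
            if (!similar(lower, start + k, j + k)) return false;
        }
        return true;
    }

    // Returns {start, length, occurrences} of the pattern covering the most
    // child nodes, or null if no pattern satisfies the constraints.
    static int[] findSignificantPattern(boolean[][] lower) {
        int n = lower.length;
        int[] best = null;
        int bestCover = 0;
        for (int len = 2; len * 2 <= n; len++) {            // at least two nodes long
            for (int start = 0; start + 2 * len <= n; start++) {
                int count = 1;
                int j = start + len;                        // instances must not overlap
                while (j + len <= n) {
                    if (matches(lower, start, j, len)) {
                        count++;
                        j += len;
                    } else {
                        j++;
                    }
                }
                int cover = count * len;
                if (count >= 2 && cover > bestCover) {      // at least two occurrences
                    bestCover = cover;
                    best = new int[] {start, len, count};
                }
            }
        }
        return best;
    }
}
```

For six child nodes alternating between two kinds of token list (A B A B A B), this sketch reports the pattern starting at node 0 with length 2 and three occurrences, which covers all six nodes.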

The above concludes the four tasks performed by the content analysis module 20.
After these tasks, the system will have constructed an XML tree based on the retrieved web
page content. This tree includes various additional information about the web content, and
is now in a format suitable to be passed to the Adaptation Module. The information which
has been provided by this process relates to:

i) Unbreakable groups which are not supposed to be separated during adaptation
(i.e. the "chunks" referred to above);



„, „e —3 Of group, an. s,n„e o.ects. wHch .ndicata ^^e^er tHay c.n be 
ignored or removed; characters of the content somce, which are used by 

now incudes iden^fiers (Known as <g> tags) " ^ as 

cbntent tor display on separate web " 

discussed eadler, based both on natural breaKs ,n ^ ^ 

<,abie>,<td>.<.on.>,et=) and also based on ^PP^P^*; ^ ; ^.^^ 3 number , 
,a dispiayedtogether.T.ehlerarch,ca,tree.™..^^^^^^^^^^^^ 

of nodes indicating where the tree may be spirt, and there _ 
Closest to the leaves, indicate those nodes below >*;ch -n sp.« 9 
.hreaKabiegroupsofcont^whi^ca^^^^^^^^^^^^ 

An additional aspect of the XML ^eejt ^.^ 

20 These ident^ers are labels as^cated e heM^b n* . ^ 
n,ultimedla elements such as an image r ^^.^'^l ,,jects which are similar, for 
obleols which perform certain funct,cns '^^'^^^ -^l^^ ^ other cha^cteristics 
e^Lple of the same type and ^th similar lists 0 ^ ~ "^J^^^, ^. 
(such as image s'.e or number of text ^^-^^^^\^^'^^ ^ ensure unffonn 

-r;oftr:=::-=--^^^^^^ 

ralls,m,,arlmagesae.bo.esarereducedby.esam^^ 

AS mentioned, the Content Analyse /" "J^ „<^„,e te then 

analysis ( the XMU tree) to the Adaptafon ™ '^J^' ^^^^^ of 
30 retHeves all the .lent capabil^y ^^j^^eJ ™^ 



capability profile. 
The algorithms that operate on the XML tree to adapt the content are illustrated in
the flow charts in Figures 5 and 6. Figure 5 shows the algorithm which checks whether the
content will fit into the current profile range for the display device. This algorithm cycles
through a number of stages, checking each time whether the content will fit the page (ie the
display device), and if not then it performs a number of transformations to progressively
reduce the size of the content. The algorithm of Figure 5 forms just one of the stages of the
algorithm shown in Figure 6, which further operates to split the content onto different pages
(if necessary).

Starting with the algorithm of Figure 6, the purpose of this is to ensure that if the
content is too large for display on a single page on the intended display device, then it is
split into multiple smaller pages, whilst minimizing the amount of white space on the pages.
To perform the splitting, the algorithm uses the <g> tags in the XML tree structure. The
algorithm operates by moving through the tree from node to node (the nodes tagged with a
<g> label) starting from the top <g> node. At each node, the algorithm calculates whether
the content in the sub-trees below it will fit the display, applying transformations to the
multimedia objects (such as shrinking the font size / image size) as appropriate. If the
content below the current <g> node will not fit the display, then the algorithm moves down
to the child <g> nodes and recalculates for each of those (splitting them further if
necessary) until the whole web page has been output as multiple smaller web pages.
In more detail, with reference to Figure 6, the steps carried out by the algorithm are
as follows. Two temporary stores, stacks "Q" and "T", are used to temporarily store the
nodes during processing. A third store, array "Trans", is used to store data relating to the
transformations (such as reduction of the font size, or image size) which are to be applied to
the various objects in the web page.
The adaptation algorithm commences at step S6.1. Firstly, the algorithm takes the
top <g> node from the tree and puts it into stack Q (step S6.2). The algorithm also ensures
that the temporary stack T is cleared (step S6.3). The algorithm then moves to step S6.4,
which calls the function fits_page(T,Trans), explained in more detail below, to check
whether the <g> nodes currently stored in T will fit into the client display, and applies
transformations to the multimedia objects as appropriate. However, since at this point the
stack T is empty, fits_page(T,Trans) will return TRUE, and processing moves onto step
S6.5, which is to check whether all the nodes in T have the same parent, using the function
are_siblings(T). Since T is empty, this also returns TRUE, moving processing onto step
S6.6, which checks whether there are any nodes remaining in stack Q. Since there are
nodes in Q (the top <g> node is currently in Q), processing moves onto step S6.7 which
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« irto the *n, device display, « ^PP^"^— ^''^ „ stage, the 

appropriate. This step involves the algon«,n, — /J , ,,,3, device 

aigorithm is checking whether the entire web -9 J* - ' ,0 
(^.sethe node currently stored inTIs he top nod^. .^^^^^^^ 

„, ^ aigorithn, will terminate by passing ^l^'^^^^,^ , entire 

nodes remaining In stack Q, the process jumps to step Sai5 wh 
,ee(uslngaca.l,othemnction.ndenfr,rran.H^^^^^^^^ 

However, when the entire web page o-'^"**^ ™ ^^^^^ top node 

^S6.4 Will be naga«.e, and process,ng moves oM^^^ 

- -ckT and ---- ^w^r ir^r the node N Is a ieat. At this 
so processing moves to step S6,10 which 

stage, the answer Is negaUve (because the ^ ,s thej P J ^ ^ 

_tostepSe.t,wh.hp..s.^^^^ 

rirstr;re:e:cr<g>n.eso,thetopx.^ 

.0 P^cesslngcycleeba^tostepsaawhichciearsthetem^^^^^^^^^^ 

processing then moves thro^h ^^^ ^'^ ^ web 
flrst of the direct Child nodes from stac^ QJ^^^s^^^^^^^ 
content (le«,e content in the sub-treesb^ow^^^^^^ 

dieot display (le step S6.4 returns TRUE), in th,s cas p ^ 
as stepsSeASa.5,Sa.6andS6.7toaddthese^d^^^^^^ 

p^^ess v.,i repeat -^'"^^^^^^'^^^XZ It the client display, if at any point 
content (le of ail the nodes added to s^ack T) st ^, „^^3 T) 

tne source content (ie the current portion o, c nten ep ese^ V ^^^^ 
not « me display fe step S6.4 is negative) ^^^^^^^^^^ ,^a< T 

30 the last node added to ^ck T (,e 2::^^:::^::^: remaining on 

and stored as node N. Step S6.9 then che ^ ^^^^^^^ 

stack T, and. If yes, processing moves to step S6.13 v*e ^^^^ ^.^ 

.ore . came from (ie stack Q, for P~ . adapted 

.„,ent r T^e a.or«hm conunues w«h 

35 smaller pages using a call to me .u 
S6.12 determining whether there are still nodes remaining in stack Q for processing, and, if
yes, moves to step S6.3.

Following the steps illustrated in Figure 6, the algorithm will iteratively work through
the entire tree, transforming the content to see if it can be made to fit the display (ie in
function fits_page(T,Trans)) and if not, then splitting the content even further. This is
achieved as follows: each time the content below a node does not fit into the display
device, its direct child <g> nodes are separately added into a store (the queue in stack Q)
in step S6.11 for processing later. This acts to split the content into
smaller and smaller portions, so that the sub-trees can then be checked individually to see if
the content is a suitable size for display on the client device. For any content portions that
do fit the display, an attempt is then made to combine them with other portions of the
content (ie step S6.7) so as to minimize the amount of white space in the display. However,
this combining of different portions of the content is only made if the nodes in question are
siblings (i.e. have the same parent <g> node in the tree). In this way, the adaptation
algorithm works its way down through the entire tree, from the top <g> node to the bottom
<g> node leaves (below which no further content splitting is allowed).

By virtue of the hierarchical tree arrangement of the web content in the XML tree, and
the order of the steps performed by the adaptation algorithm, the top-to-bottom, left-to-right
order of the content is maintained when it is displayed. Also, it ensures that only whole and
related composite objects (of the same level of grouping) are displayed on a single page.
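
The traversal of Figure 6 can be illustrated with a minimal runnable sketch. Several simplifications are assumed here and are not taken from the description: each leaf carries a single integer "size", fitting is a plain comparison of the summed sizes against a capacity, no transformations are applied (so the array Trans is omitted), and child nodes are pushed onto Q in reverse so that the leftmost child is processed first. The names SplitPageSketch, Node, fitsPage and areSiblings are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class SplitPageSketch {
    // A <g> node; leaves carry the size of their content (simplified model).
    static class Node {
        final String name;
        final int size;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(String name, int size) { this.name = name; this.size = size; }
        Node add(Node child) { child.parent = this; children.add(child); return this; }
        boolean isLeaf() { return children.isEmpty(); }
        int total() { int t = size; for (Node c : children) t += c.total(); return t; }
    }

    // Simplified fits_page: true if the combined content fits (true for empty T).
    static boolean fitsPage(Deque<Node> t, int capacity) {
        int sum = 0;
        for (Node n : t) sum += n.total();
        return sum <= capacity;
    }

    // True if all nodes in T share one parent (or T has zero or one node).
    static boolean areSiblings(Deque<Node> t) {
        Node parent = null;
        boolean first = true;
        for (Node n : t) {
            if (first) { parent = n.parent; first = false; }
            else if (n.parent != parent) return false;
        }
        return true;
    }

    // Outputs the given nodes as one page (recorded here as a list of names).
    static void render(Iterable<Node> nodes, List<List<String>> pages) {
        List<String> page = new ArrayList<>();
        for (Node n : nodes) page.add(n.name);
        if (!page.isEmpty()) pages.add(page);
    }

    static List<List<String>> splitPage(Node top, int capacity) {
        List<List<String>> pages = new ArrayList<>();
        Deque<Node> q = new ArrayDeque<>(); // nodes still to be processed
        Deque<Node> t = new ArrayDeque<>(); // nodes being combined into one page
        q.push(top);
        outer:
        while (!q.isEmpty()) {
            t.clear();
            while (fitsPage(t, capacity) && areSiblings(t)) {
                if (q.isEmpty()) break outer;   // finished the tree
                t.addLast(q.pop());
            }
            Node n = t.removeLast();            // the node that broke the page
            if (!t.isEmpty()) {
                q.push(n);                      // return it to Q for later
                render(t, pages);
            } else if (n.isLeaf()) {
                render(Collections.singletonList(n), pages); // cannot split further
            } else {
                for (int i = n.children.size() - 1; i >= 0; i--) q.push(n.children.get(i));
            }
        }
        render(t, pages);                       // output the last part
        return pages;
    }
}
```

For a top node with children A (40), B (40) and a group C containing C1 (70) and C2 (50), and a capacity of 100, the sketch yields three pages: A and B together, then C1, then C2. C1 and C2 are not combined because together they exceed the capacity.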

In addition, the algorithm ensures consistency of display of similar objects
throughout the smaller pages. It achieves this by the use of a store (eg the array called
Trans) which maintains a list of transformations which have been applied to the objects,
together with the object label associated with that object (ie the second type of identifier tag
referred to earlier which indicates similar objects). For each portion of content, the
algorithm checks array Trans to see if changes have already been applied to objects which
were similar, so as to ensure that the transformations applied to these objects are
consistent. The array Trans is dynamically updated during processing, each time the
function fits_page() successfully applies new transformations to a portion of content that
result in it fitting within the page size.

It is to be understood that throughout the description of the adaptation algorithm
above, reference has been made to two stores, stacks "Q" and "T", used to temporarily
store the nodes during processing. For the purposes of the function of the algorithm, such
stores can therefore be considered to hold the relevant portions of content associated with
those nodes. This is true notwithstanding the fact that the stores themselves may actually
be implemented simply as a list of memory addresses which point to the locations of the
content in another part of the memory.

An exemplary embodiment of the adaptation algorithm (pseudo Java code) for
adapting the web page is given below:
splitPage(top_node)
{
    Stack Q;       // Q is a stack which holds the nodes to be processed
    Stack T;       // T is a stack which holds nodes being processed
    Node N;        // N is a temporary node
    Array Trans;   // Trans is an array which holds the transformations

    Q.push(top_node);      // Add the top <g> node in the XML tree to stack Q

    while (Q.size()>0)
    {
        while (!T.empty()) T.pop();     // Empty stack T

        while (fits_page(T,Trans) && are_siblings(T))
        {
            if (Q.size()==0) goto end;  // Finished XML tree
            N=Q.pop();                  // Pop the top node off stack Q...
            T.push(N);                  // and add it onto stack T
        }

        // Remove top node from stack T and store it as node N
        N=T.pop();

        // Checks whether there are any nodes in stack T
        if (T.size()>0)
        {
            Q.push(N);                  // node N is returned to stack Q for processing later
            Trans=render(T,Trans);      // output a section of the XML page
        }
        else
        {
            // If the current node N is a leaf, render it
            if (is_leaf(N)) Trans=render(N,Trans);
            else
                for each child C of N   // push the direct child <g> nodes of N
                {Q.push(C);}            // onto stack Q for processing
        }
    }

    // output the last part of the XML page
    end: render(T,Trans);
}
where the following functions are used by splitPage():

is_leaf(N)

This function returns TRUE if the temporary node N is a leaf.

are_siblings(T)

This function returns TRUE if the nodes in stack T share the same parent <g> node
(or T contains a single node or is empty).


fits_page(T,Trans)

This function determines whether the nodes in stack T (ie the content of all the sub-trees
below those nodes) are small enough to fit into the display device. Firstly, if stack T
contains any nodes with identifier labels the same as those in Trans, then the associated
transformations are applied to those objects. If the content now fits the page then
fits_page(T,Trans) returns TRUE.

However, if the content does not yet fit the page, then a number of different
transformations can be applied to the content, and these are illustrated in Figure 5. If any
of these additional object transformations successfully result in the content fitting the display
device then the transformations are added to the array Trans with the appropriate object
labels, and fits_page(T,Trans) returns TRUE.

Alternatively, if the source content will still not fit the display device, even after all
possible additional transformations have been applied, then fits_page(T,Trans) will return
FALSE.

This function will also return TRUE if stack T is empty.

render(N,Trans) or render(T,Trans)

This function is used to output a portion of the content (ie the content represented
by one or more nodes of the XML tree) as one of the adapted smaller pages. The function
either has input parameters of node N and array Trans, or stack T and array Trans. Since
Trans contains the identifier labels for objects which have been rendered to date, along with
those transformations and parameters associated with them, then if N or T contains objects
whose labels are the same as those in Trans, the relevant transformations are applied to
those objects. The remaining objects are then transformed as appropriate (ie to fit the
intended display device whilst minimizing the white space), and the labels and
transformations added to Trans. Finally Trans is returned by the function.

^ . ^. .en«oned eadier, dudng the — caii ^.^aaen", r^^^ ^ 
.oes not « the prc«ie ran.e <e. the intended display ^^^^^^^^ 
,ans,orma.ions can he applied to the content, and these -^J^J^^^^ 
^,h reference to Figure 5. The aigorithm wh,ch --jj^J^^i:*^ the 
checks: i) Check if the total pixels and characters of the source content 
, ^^^''Toheck.anerre.oving.hehiankspacesandilnes.sourcecontentcanhef.lnto 

•^^^^Tc"" ren,ov,n. resign. su..a..ln. and hanging properties of the 

-'^^rse~r::r::;rssr^^^^^^^^^ 

. areapp.,rorder,h.anereachtransforn,.asheen^^^^^^^^^^^^ - 
.0 determined « the content as transfonned up to ^ ^ ^ „ -.t is detennlned ; 

— zr^'""— — ■■ ■ ; 

transformations are applied. 

size with -Verdana" as the font-fam.ly. ^^^^ 
2"^ transformation: Image reduction. The purpose is to reduce .mage j 
goes into recursion until it reaches the optimum size or 50Vo 

, Th. nnmose is to get rid of those unnecessary space 
4'^ transformation: Space removal. The purpose is to gei r 

6^ transfomiation. Decoration image removal. The purpose 

have decoration properties based on objects' s^e^ ^^^^^^ 

T-^ transfomiation: Decoration text removal. The purpose 

which act as decoration if they are special characters. 
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algonthm will compare the alternate text size ^.h the image ,tse«. The shorter w,, 

Considering the transformation algorithm in more detail, Figure 5 illustrates the eight
different transformations which may be applied. Referring to Figure 5, the procedure
provided thereby is started at step 5.1 wherein two counters are initialised. More
particularly, a first counter i is initialised to i=1, and a second counter r is initialised to r=0.

Next, processing proceeds to step 5.2, wherein an evaluation is performed to
determine whether or not the web content will fit into the display of the intended display
device. This evaluation is performed by comparing the characteristics of the web content with
the client device display capability characteristics as provided in the client capability
profiles in the profile server 26. To generate a particular adapted version, at step 5.2 the
evaluation is always performed against a single one of the client capability profiles, in
respect of which an adapted version is being generated by the present instantiation of the
algorithm.

If the evaluation at step 5.2 indicates that the web content can fit into the display of
the intended display device, and no transformation is required, then the transformation
algorithm ends at step 5.3 and returns TRUE for the function fits_page().

on the contrary, if the evaluation at step 5.2 returns a negative, then process ng 

started which takes the forms of steps 5.8 and 5.10. 
,S " ep 5.5 the font size for al, text in the web content to be adapted ,s s^ a . 
■ although in other embodiment other values may be chosen. Next, at s^p 5.10 J^n^ 
ZL for al, text in the web content Is set as -verdana-. These ^^P^ « ^^^^^^^ 

:Loa,„ --v: ttc::: ^^^^^^ - 
. 30 atstep5.^ 

" odrn^nLhetherornotthetransformedwebcontentwIllnow™^ 
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.nega..e.esO,t is returned ana hence proees^^^^^ ^ ^„ ^ 

^,ch evaluates whether me counter ns equal ^ 

.etumea, thereupon processing w«i ^'^^'^J^^^^ commenced. Within the image 
5 A, step 5,16 an in.age reduction ' ^.^^on of the image is 

.«iuctiohtrans,orma«on, flrst at step 5.8 a n.— ^.20 an evaluation Is 

oMalned. This isahard coded value, .or exarn^^^^^^^^^^^ 
„3deastowhetherornotthemax,,r,un,reduct,onv^u^^^ 

....eccunterr. - ^^''^ ;;;:;;^llttL e— n o, step 5..0 - 
,0 inltiallseda.tozero,andhanceonthe,,r t e u^„nth ^^^^^ ^^^^^^ ^^^^ 

' posftive value. Here processing proceeds to s^^^ ^ ^^^^^ ^ , 

content are reduced by 10%, ^"^"'^^^ ''l'^^,^, I evaluation is made as to 
, incremented by one, and .cm ^ ^^^ZTZ^^ been incremented at step 
«..ther or not r is equal to 5, On the ^ ^„ ,etum a negative :, 

,5 5,24 to ta.e me value 1 only, and ^^IlZZ^,^ 5.2, wherein me evaluation as. , 
..,ue. inmis case processing prcceeas^m^V a Mo P ^^^^^^^^^^^ 

,„ -«.emer or not the transforr^ed ,„as.. step 5.3 aimough r,t Is not 

aevlcelsundertaKen-lfthisisthecasemenP",^^^^^^^ 

thecaseihen processing prccaeasv,astep5^4tostep ^ 

,0 been incremented / -::/r td s^^^^^^^^^ 

transfom-ation of steps 5,18, 5.20, 5^22, 5^24 ,^„3,„^aaon can be appLed 

Hwill be seen from Rgure 5 that the image redu ^^.^ 
„ptor«e»mes,andeach«meltlsap,iedme,^^^^^^^^^ 

,or each recursion, or by me max,mum J ^,,,„,,„«,,3 me present 

25 maxlmumreductionavailabieo.the,mage,sno e^^e^^^^^^^ 

. , ,„ efther event, however, the —f"" . pos^ive, whereupon 
counter r =5, In th,s.oase, me -^'"^""^ ^/j^,,, , ...cremented. From step 5.12 
processing will proceed to ^ep 5.12, -^-^^^""^^^ ^s to whether or not 

processing always proceeds bacK to ^P^^' ^Ted display device Is undertaicen, Kthe 
30 Lcontentwlli "ow « Into the display dth -nten^^^^^^^^^^^^^^^ ^^^^^^ ^ 

«„.,o^atlons already ^^-^'f ^^^j" * — atlons already applied are not 
value and processing will end a ^^^^^ ^„ processing will proceed 
Here, an evaluation is made as to whether or not the counter i equals 3, and if so
processing proceeds to step 5.32, wherein the control object reduction transformation is
commenced by proceeding to step 5.34.

At step 5.34 a ratio is obtained of the default screen size for the web content, and
the actual screen size of the intended display device. At step 5.36 a size of each control
object is calculated by applying this ratio to the
default size. Then, at step 5.38 an evaluation is performed as to whether or not the
calculated size for each control object is less than the minimum allowable size for each
object, and if not processing proceeds to step 5.42 wherein the control object sizes can be
reduced based on the calculated ratio. If, however, the calculated size is less than the
allowable minimum size of each control object, then processing proceeds to step 5.40,
wherein the size of the control objects in the web content is reduced based on the allowable
minimum size. The allowable minimum size is predetermined in advance.
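
Steps 5.34 to 5.42 amount to a scale-and-clamp calculation, sketched below. The direction of the ratio (actual screen size divided by default screen size), the use of a single linear dimension in pixels, and the names ControlObjectScaler and scaledSize are assumptions made for this illustration.

```java
class ControlObjectScaler {
    // Scales a control object's default size by the screen ratio (steps 5.34 and 5.36),
    // but never below the predetermined minimum size (steps 5.38 to 5.42).
    static int scaledSize(int defaultSize, int defaultScreen, int actualScreen, int minSize) {
        double ratio = (double) actualScreen / defaultScreen;
        int calculated = (int) Math.round(defaultSize * ratio);
        return Math.max(calculated, minSize);
    }
}
```

For example, with a default screen width of 800 and an actual width of 240, a 200 pixel control is reduced to 60 pixels, whereas a 100 pixel control would be calculated as 30 pixels and is therefore held at a minimum of 40 pixels.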

After either step 5.40 or step 5.42, processing proceeds to step 5.12 wherein the
counter i is incremented, and thereafter to step 5.2 wherein the evaluation as to whether or
not the transformed content will now fit into the display of the intended display device is
performed.

Assuming the evaluation of step 5.2 returns a negative, the counter i is now equal
to the value 4, and hence processing proceeds via step 5.4, step 5.14, and step 5.30, to
step 5.44 wherein the evaluation that i equals 4 returns a positive value. This has the result
of causing processing to proceed to step 5.46, wherein the space removal transformation is
commenced.

This transformation relates to looking at object tags within the web content, and
removing those objects which have particular tags and/or which meet other certain
conditions. Therefore, at step 5.48, those objects which have tag <BR> and which are the
first child and the last child of objects with tags <TD> and <DIV> are removed. Next, at step
5.50, those objects with tag <BR> and which are the sibling of objects with tag <TABLE> are
also removed, and then, at step 5.52 any continuous blank spaces within the web content
display objects are reduced to a single space, and correspondingly, at step 5.54 any
continuous breaks within the web content display objects are reduced to one. Finally, at
step 5.56 the cell padding and cell spacing values of any <TABLE> objects are reduced to
zero. The result of the space removal transformation is to reduce blank space in the web
content to an absolute minimum.
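
Two of these steps, the collapsing of continuous blank spaces (step 5.52) and of continuous breaks (step 5.54), can be illustrated on a plain string. The sketch below uses regular expressions on raw markup, which is a simplification: the system described operates on the parsed tree, and the tag-position rules of steps 5.48 and 5.50 and the table attributes of step 5.56 are not covered. The class and method names are hypothetical.

```java
class SpaceRemoval {
    // Reduces runs of <br> tags to a single <br>, then runs of spaces or tabs to one space.
    static String collapse(String html) {
        String s = html.replaceAll("(?i)(<br\\s*/?>\\s*){2,}", "<br>");
        return s.replaceAll("[ \\t]{2,}", " ");
    }
}
```

Applied to the string "a  b<br><br/><BR>c", the sketch returns "a b<br>c".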

Following step 5.56 processing proceeds to step 5.12, wherein the counter i is
incremented to 5. The evaluation at step 5.2 is then performed to determine whether or not
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would ralurn a positive. Tt^ereafter at step 5.60 the ^"^ '^^ 
, ^ict,actsatstep.62tore™veai,aisp,ayob,e<.s2^^^^^^ ^^^^^ ^^^^ 

the same mnotlon as the fourth transformat,on previously applied, 

,.=H= to steo 5 12 onde again, wherein the counter i 

been incremented to 6. „o<«.sing proceeds to step 5.66, ^«herein the 

Therefore, following step 5.64 processing pro ^ 

incremented, and thereafter to the — ""f*^^;^,'^ ,,3p,3, device. Assuming 
.ansfon^ed content will n.w « into ^^J^^l^^^^^^l^l^l^ of steps 5.4, 5.14, 
20 this is not the case, processing proceeds by the es^a ^^^^^ ^ ^ 

, 5.30, 5.44, 5,53, and 5.64 to step 5.70, and "J , ^^reln the 

p.si«ve result is refunded. This causes ^^^^^^J^^^ ,^^ ,,03e Images or 
decoration image remove .ransfom,at,on ,s ^'^'^^"^'J^^^^ 3, ,,eoratlon ■ 

rmrr=:::— ^ 
"°"^^"'=r:eSr-= 

counter i is incremented. Thereatter me Intended display device, 

or no. the now transformed content - m into the d.p^V of e n e 
3ssum,ng that this ev—^^an..^^^^^^^^^ 

respective evaluations of step 5.4. step 5.14. P g 
alternative text. If this is the case, then processing proceeds to step 5.82 wherein a further
evaluation is performed as to whether or not the total pixel size of the alternative text to the
image is smaller than the image itself. Only if this is the case will the image be replaced
with the alternative text. There is clearly little point in replacing an image with alternative
text, if that text will take up more space than the image. Following the replacement at step
5.84 processing proceeds to step 5.12. Similarly, if either of the evaluations of step 5.80 or
step 5.82 returns a negative value, i.e. an image does not have alternative text, or the
alternative text is not smaller than the existing image size, then processing similarly
proceeds to step 5.12. It should be pointed out here that the image transformation depicted in
Figure 5 is applied to each image in turn, before processing proceeds to step 5.12 and the
counter i is incremented. Moreover, this processing of multiple objects in the web content
applies to each of the transformations previously described, in that each transformation is
applied to every relevant object in the web content before the counter i, which allows the
next transformation to be applied, is incremented.
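
The decision of steps 5.80 and 5.82 can be sketched as a single predicate. How the "total pixel size" of the alternative text is measured is not stated, so the estimate used below (number of characters multiplied by an assumed fixed character width and height) is purely an assumption, as are the names AltTextReplacement and shouldReplace.

```java
class AltTextReplacement {
    // Returns true only if the image has alternative text (step 5.80) and that text
    // is estimated to occupy fewer pixels than the image itself (step 5.82).
    static boolean shouldReplace(String altText, int imgWidth, int imgHeight,
                                 int charWidth, int charHeight) {
        if (altText == null || altText.isEmpty()) return false;
        long textPixels = (long) altText.length() * charWidth * charHeight;
        long imagePixels = (long) imgWidth * imgHeight;
        return textPixels < imagePixels;
    }
}
```

Under these assumptions a 100 by 50 pixel image with the alternative text "Logo" would be replaced, while a 16 by 16 pixel icon with a long description would be kept as an image.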

At step 5.12, once again the counter i is incremented, such that in this case it now
takes the value 9. Therefore, when processing proceeds to the evaluation at step 5.2, the
alternative condition of that evaluation that i is greater than 8 is now met, and hence the
transformation algorithm must therefore end.

It will therefore be seen from the above description that the transformation
algorithm acts to apply each transformation to the display objects in the web content in turn,
and evaluate after the application of each transformation as to whether or not the
transformed web content is capable of being displayed on the display of the intended
display device. If this is the case then no further transformations are applied, and the
function returns TRUE to fits_page().
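
The control flow summarised above (check the fit, apply the next transformation, check again, and stop when the transformations are exhausted) can be sketched generically. In this illustration the content is reduced to a single integer size and each transformation is a function on that size; both are simplifying assumptions, the repeated image reduction governed by counter r is not modelled, and the names TransformLoop and fitsAfterTransforms are hypothetical.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

class TransformLoop {
    // Applies the transformations one at a time, re-checking the fit after each.
    // Returns true as soon as the content fits, false once all have been tried.
    static boolean fitsAfterTransforms(int size, int capacity,
                                       List<IntUnaryOperator> transforms) {
        int i = 0; // index of the next transformation to apply
        while (size > capacity) {
            if (i >= transforms.size()) return false; // nothing left to try
            size = transforms.get(i).applyAsInt(size);
            i++;
        }
        return true;
    }
}
```

With two transformations that respectively subtract 10 and halve the size, content of size 100 fits a capacity of 50 after both have been applied (100, 90, 45), but cannot be made to fit a capacity of 10.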


After the adaptation process is done, ie after the adaptation algorithm of Figure 6
has ended, a different version of web content will have been generated for a particular client
capability profile. As there are a plurality of client capability profiles, however, the algorithms
must be run repeatedly to generate an adapted web content version for each client
capability profile. Following this (i.e. after all the versions have been created) the
Adaptation module creates the profile IDs for the versions created based on the client
capability profiles, and stores the adapted versions and IDs in the Content Cache. The
logical relationship of the profile IDs and physical adapted content is in the form of a
database structure cross reference link. These versions of adapted content are then ready
to be retrieved and delivered to an end user upon request.
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Thus in the above described mode of operation, the system --*"3f 

display device with known and spec5f.ed-charactenst,cs. - • 

AS mentioned previously, however, the system may also operate .n we 

capacity. This n.ode of operation will be descnbe^^^^^^ 

imagine that the system IS acting as a web server ana 

• «n httD reauest for web content from an end user device 1. That ntip 
The server receives-an http request tor weo detemiine 

standards suet, as those put fonvart by M31 (ptease refe ^^^^ 
Oe.ce independent. MarK H BU«er, HP Z ^..standing ot, 

S referenced previously, the relevant conten^ ^, present time most: 

the present invention being incorporated here,n ^efe'-"-'^ Z 'wpe and version,,. 
«eL. browsers contain end W device informa on s^^^^^^^ 

,P .adress, saeen resolution etc in the ini«al request o = ^ 

aevice Will star, communicating wHh the server '"^" '^^^^^^^l^,,^ ,,,ue 12 
Tn net the end-user device information. Client capaDuityara 

JO web browser. To get the ena u infonnation sent from the end- 

uses a Simple Javascript- to retrieve ' ^ "^^^^^^^^^ 3 ^3,3 3,^et program. 
„se.s browser and passes the '^^'^^^J^J^o L w^lch get and post end-user 
There follows below a sample of the Javascript program which gets and posts end-user
device information to the server through a Java Servlet called "clientprofile".


<script language="JavaScript">
function getdeviceinfo() {
document.formclient.pageUpdate.value = document.lastModified;
document.formclient.availHeight.value = screen.availHeight;
document.formclient.availWidth.value = screen.availWidth;
document.formclient.bufferDepth.value = screen.bufferDepth;
document.formclient.colorDepth.value = screen.colorDepth;

r^:ft:rorcii:f.:™'^^^^^^^ 
35 ^-^--rc^rt^^^c^^^^^^^^^ 

T-ZlVo^^lZ^^^^ screen.updatexnterva. 
document..ormc..ent.,a.^^a..e^^-^^^^^^^ 

document.formclient.appVersion.value = navigator.appVersion;
document.formclient.cookieEnabled.value = navigator.cookieEnabled;
document.formclient.cpuClass.value = navigator.cpuClass;
document.formclient.mimeTypes.value = navigator.mimeTypes;
document.formclient.appCodeName.value = navigator.appCodeName;
document.formclient.platform.value = navigator.platform;
document.formclient.opsProfile.value = navigator.opsProfile;
document.formclient.plugins.value = navigator.plugins;
document.formclient.systemLanguage.value = navigator.systemLanguage;
document.formclient.userAgent.value = navigator.userAgent;
document.formclient.userLanguage.value = navigator.userLanguage;
document.formclient.userProfile.value = navigator.userProfile;
document.formclient.action = "clientprofile";
document.formclient.submit();
}
</script>



Having determined the set of client characteristics of the end user device 1, the
client capability discovery module 12 then passes the set of characteristics determined
thereby to the decision module 14, which acts to compare the end user device
characteristics with the set of client capability profiles stored in the profile server 26. If the
decision module 14 can match the set of end user device characteristics with one of the
client capability profiles, then the decision module accesses the content cache 10 which
stores the different versions of adapted content using the profile ID of the client capability
profile which matched to the end user device characteristics as an index thereto. The
adapted version of the web content which is indexed to the profile ID of the matching client
capability profile is then retrieved from the content cache 10, and supplied via the network
to the end user device 1. Thus, in this mode of operation, the system is able to match end
user device display characteristics with a set of predefined device characteristics, so as to
determine the appropriate pregenerated adapted version of the web content to send to the
end user device.
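
By way of illustration only, the matching and retrieval just described might be sketched as follows. The profile fields (maxWidth, maxHeight, minColorDepth), the profile IDs, the first-fit matching rule and the function names matchProfile and lookupAdaptedContent are all assumptions made for this sketch; the description does not specify how a client capability profile is represented or how a match is judged.

```javascript
// Hypothetical client capability profiles; the field names are assumptions.
const profiles = [
  { id: "pda", maxWidth: 240, maxHeight: 320, minColorDepth: 8 },
  { id: "desktop", maxWidth: 1600, maxHeight: 1200, minColorDepth: 16 }
];

// Hypothetical content cache: pregenerated versions indexed by profile ID.
const contentCache = {
  pda: "<html><!-- version split into small pages --></html>",
  desktop: "<html><!-- full-size version --></html>"
};

// Return the first profile whose limits accommodate the device, else null.
function matchProfile(device, profileList) {
  for (const profile of profileList) {
    if (device.availWidth <= profile.maxWidth &&
        device.availHeight <= profile.maxHeight &&
        device.colorDepth >= profile.minColorDepth) {
      return profile;
    }
  }
  return null;
}

// Use the matching profile's ID as the index into the content cache.
function lookupAdaptedContent(device) {
  const profile = matchProfile(device, profiles);
  if (profile === null) {
    return null; // no match: nothing pregenerated for this device
  }
  return contentCache[profile.id];
}
```

Under these assumptions, a device reporting a 240 by 320 display would be served the version indexed under "pda", while a device larger than every profile would yield no match.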

As previously mentioned, the system also provides a further mode of
operation, which combines the operations provided by the previously described modes.
Here, when an end user device 1 makes a request for web content, as before the client
capability discovery module 12 acts to determine the display characteristics thereof, which
are then passed to the decision module 14. The decision module 14 then attempts to
match the capabilities of the end user device 1 with the client capability profiles stored in the profile
server 26, and if a match is found then the appropriate adapted version of the web content
is retrieved from the content cache 10, and passed to the end user device 1 over the
network. If, however, no match can be made, then the decision module 14 acts to operate
the adaptation module 16, by passing the details of the end user device 1 relating to the
characteristics as determined by the client capability discovery module 12 to the adaptation
module 16. The adaptation module then [...]
stored in the profile server 26 corresponding [...]
and also starts its operation in exactly the [...] pre-
generating adapted versions of the web content so as to [...] a new adapted version of
the web content adapted specifically for [...]. That is, the adaptation
module 16 causes the content analysis [...] which analyses the web
content, allowing the adaptation module to [...] so as to generate a
new adapted version of the web content specifically for the end user device 1. The new-adapted
web content is then fed back to the decision module, which [...]
the end user device 1. In addition the new adapted web content is also stored in the content
cache 10 for future use by similar end user devices.
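
The combined mode of operation described above can be pictured with the following minimal sketch. It assumes that a profile ID is simply derived from the discovered display characteristics, that matching is exact, and that a newly seen device is recorded as a new profile; the names profileIdFor, adaptContent and handleRequest are hypothetical, and adaptContent is only a placeholder for the adaptation module 16.

```javascript
// Stand-ins for the profile server and the content cache (assumed structures).
const profileServer = new Set();   // known profile IDs
const contentCache = new Map();    // profile ID -> adapted content

// Hypothetical profile ID built from the discovered characteristics.
function profileIdFor(device) {
  return device.availWidth + "x" + device.availHeight + "x" + device.colorDepth;
}

// Placeholder for the adaptation module; a real one would split and
// transform the content to suit the device.
function adaptContent(content, device) {
  return "<!-- adapted for " + profileIdFor(device) + " -->" + content;
}

// Serve from the cache when a profile matches; otherwise adapt on demand,
// record the new profile and store the result for similar devices.
function handleRequest(content, device) {
  const id = profileIdFor(device);
  if (profileServer.has(id) && contentCache.has(id)) {
    return { content: contentCache.get(id), fromCache: true };
  }
  const adapted = adaptContent(content, device);
  profileServer.add(id);
  contentCache.set(id, adapted);
  return { content: adapted, fromCache: false };
}
```

In this sketch the first request from a given kind of device incurs the cost of adaptation, and later requests from devices with the same characteristics are answered from the cache.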

[...] the customisation module 24. This is merely a [...] the created
various adapted versions of the web content [...]
further refinements or improvements thereto [...] of this functionality, no
further discussion of the customisation module [...]

In conclusion, therefore, the system allows [...]
be created in advance of user request, [...] requests can then be
serviced by matching the display [...] created
versions, and hence allowing a response to [...] and with very little
computing intensity. Additionally, if required [...] content can be
[...] content, and
[...]
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but not limited to". 
the content and transforming it as necessary to reduce it in size to fit the display device.
However, an alternative embodiment is also envisaged in which the procedure would be 
modified so as to initially reduce the whole content down to its smallest possible size, before 
splitting the content and then transforming by rescaling upwards again to increase the size
of the content to fit the display size. Upon rescaling the content upwards, it might then be 
necessary to re-split the content so as to better distribute it between pages. This procedure 
would therefore also involve iteratively splitting and transforming, but the step of 
determining whether the size of the content is suitable for display on the device might 
involve calculating whether an unacceptably large area of white space would be displayed 
on the device. 
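
As a rough illustration of the suitability test in this alternative embodiment, the sketch below scales a content portion, held at its smallest size, up to the display and measures the white space left over. The uniform scale factor, the treatment of content and display as simple rectangles, the threshold parameter maxWhiteSpace and the function name assessPage are assumptions for this sketch and are not taken from the description.

```javascript
// content and display are { width, height } rectangles in the same units.
// maxWhiteSpace is the largest acceptable unused fraction of the display.
function assessPage(content, display, maxWhiteSpace) {
  // Largest uniform scale at which the content still fits the display.
  const scale = Math.min(display.width / content.width,
                         display.height / content.height);
  if (scale < 1) {
    // Even at its smallest size the content overflows: split further.
    return { scale: scale, whiteSpace: 0, suitable: false };
  }
  const usedArea = (content.width * scale) * (content.height * scale);
  const whiteSpace = 1 - usedArea / (display.width * display.height);
  // Too much white space suggests the content should be re-split so that
  // it is better distributed between pages.
  return { scale: scale, whiteSpace: whiteSpace, suitable: whiteSpace <= maxWhiteSpace };
}
```

A portion judged unsuitable here would be re-split or recombined and then assessed again, in keeping with the iterative splitting and transforming described above.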
CLAIMS



1. Apparatus for adapting web page content for display on an intended display
device, comprising adaptation means for splitting the content into a plurality of smaller
web pages, [...]
[...] determine whether the size of the content portion [...]
[...] apply at least one content transformation [...] the size of the
[...]
(iv) analyse the transformed content to [...]
[...] device then split the content portion [...] content portions.

2. Apparatus according to claim 1, wherein steps (i) and (iv) to determine
whether the size is suitable comprise determining whether the content [...]
display on said device.

3. Apparatus according to claim 1 or 2, wherein the adaptation means is further
arranged to:
in the event that step (iv) determines that the transformed content portion is small
enough [...] with a further content portion to [...]
combined content portion;



portion. 
5. Apparatus according to claim 4 when dependent on claim 3, wherein the adaptation
means comprises a store for content portions, and wherein said step of combining two
content portions comprises selecting the further content portion from the store, the
adaptation means being further arranged to:
analyse the content to determine whether the size of the transformed combined
content portion is suitable for display on said device, and if the size of the transformed
combined content portion is too large for said device then break up said combined content
portion so as to return the further content portion back into said store.

6. Apparatus according to claim 5, wherein the adaptation means is further arranged
to:

if the size of the transformed combined content portion is small enough for display 
on said device then combine it with a second content portion. 

7. Apparatus according to any preceding claim, further comprising analysis means
arranged in use to translate the web page content into a hierarchical tree format comprising
a plurality of nodes labelled so as to represent suitable locations for splitting the content into
smaller web pages.

8. Apparatus according to any preceding claim, wherein the adaptation means further
comprises a store for content portions, and wherein said steps of splitting content to form 
smaller content portions comprises adding a plurality of content portions into the store. 

9. Apparatus according to any preceding claim, wherein the adaptation means
comprises:

a transformations store for storing a record of transformations which have been
applied to content together with an indication of the type of content those transformations
have been applied to.

10. Apparatus according to claim 9 when dependent upon claim 3, wherein the step of
combining the content portions as defined in claim 3 further comprises:

applying content transformations according to the record of transformations to the
further content portion so as to consistently apply transformations to the same type of
content as indicated in the record of transformations.
comprising splitting the content into a plurality of smaller web pages [...]
[...] whether the size of the content portion is [...]
[...] content portion is not suitable for [...]
applying at least one content transformation [...] size of the [...]
(iv) analysing the transformed content to [...]
transformed content portion is suitable for display [...]
(vi) if the size of the transformed content portion [...]
device then splitting the content portion into a plurality [...]

[...] display on said device.

13. A method according to claim 11 or 12, [...]
in the event that step (iv) determines that the transformed [...]
combined content portion;

14. A method according to claim 13, further comprising [...]
analysing the content to determine [...] is too [...]
[...] to the combined content portion.

[...] and further comprising the steps of:
analysing the content to determine whether the size of the transformed combined
content portion is suitable for display on said device, and if the size of the transformed
combined content portion is too large for said device then breaking up said combined
content portion and returning the further content portion back into said store.

16. A method according to claim 15, further comprising the step of:
if the size of the transformed combined content portion is small enough for display
on said device then combining it with a second content portion.



17. A method according to any of claims 11 to 16, further comprising the step of
translating the web page content into a hierarchical tree format comprising a plurality
of nodes labelled so as to represent suitable locations for splitting the content into smaller
web pages.

18. A method according to any of claims 11 to 17, wherein said steps of splitting content
to form smaller content portions comprises adding a plurality of content portions into a store.

19. A method according to any of claims 11 to 18, further comprising:
maintaining a record of transformations which have been applied to content together
with an indication of the type of content those transformations have been applied to.

20. A method according to claim 19 when dependent upon claim 13, wherein the step of
combining the content portions as defined in claim 13 further comprises:
applying content transformations according to the record of transformations to the
further content portion so as to consistently apply transformations to the same type of
content as indicated in the record of transformations.

21. A computer program or suite of programs so arranged such that when executed by a
computer system it/they cause/s the system to perform the method of any of claims 11
to 20.

22. A modulated carrier signal incorporating data corresponding to the computer program or
at least one of the suite of programs of claim 21.

23. A computer readable storage medium storing a computer program or at least one of
suite of computer programs according to claim 21.
ABSTRACT

Web Content Adaptation Process and System

[...] web page content for display on an intended [...]
of content over a number of smaller pages. The apparatus [...] a
procedure which integrates the process of splitting the content [...]
(for example, reducing the font size, images [...]) [...]
procedure is carried out systematically over [...] recursively
[...] alternating this [...]
similar objects.
[Figure 6] 
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